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Abstract 

We propose a criterion for optimal parameter selection in coarse-grained models of proteins, 
and develop a refined elastic network model (ENM) of bovine trypsinogen. The unimodal density- 
of-states distribution of the trypsinogen ENM disagrees with the bimodal distribution obtained 
from an all-atom model; however, the bimodal distribution is recovered by strengthening inter- 
actions between atoms that are backbone neighbors. We use the backbone-enhanced model to 
analyze allosteric mechanisms of trypsinogen, and find relatively strong communication between 
the regulatory and active sites. 
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A major challenge of molecular biology is to understand regulatory mechanisms in large 
protein complexes that are abundant in multi-celluluar organisms. To make simulation of 
such complexes computationally feasible, coarse-grained models have been developed, in 
which a subset of the atoms in the complex are used to simulate the large-scale motions. 
However, principled methods to quantify and optimize the accuracy of coarse-grained models 
are currently lacking. 

In one common coarse-graining method, an all-atom model is simplified by considering 
effective interactions among a subset of the atoms (e.g., just the alpha-carbons). The usual 
criterion for model accuracy is the ability of a model to reproduce atomic mean-squared 
displacements (MSDs). However, MSDs are just one aspect of protein dynamics - a stricter 
criterion for the accuracy of a coarse-grained model is the similarity between the configura- 
tional distributions of the selected atoms in the coarse-grained and all-atom models. Such 
a criterion is also biologically relevant, in part because the conformational distribution is a 

n 

key determinant of protein activity 1]. 

One useful measure of the difference between conformational distributions is the Kullback- 
Leibler divergence Dy^ (see definition below) . Recently, an analytic expression for 
was obtained for harmonic vibrations of a protein-ligand complex both with and without 
a protein-ligand interaction [3]. Here we show how an equivalent expression may be ap- 
plied to refine a coarse-grained model of protein dynamics. To use the expression for Dy^ 
requires the marginal probability distribution of a subset of the atoms in a protein, which 
we calculate in the harmonic approximation. We then apply the equations to refine an 
anisotropic elastic network model (ENM) of trypsinogen dynamics with respect to an 
all-atom model calculated using CHARMM ^5J- The unimodal density-of-states distribution 
of the ENM disagrees with the bimodal distribution obtained from the all-atom model; how- 
ever, the bimodal distribution is recovered by strengthening interactions between atoms that 
are backbone neighbors. Finally, the backbone-enhanced elastic network model (BENM) is 
used to analyze allosteric mechanisms of trypsinogen, revealing relatively strong communi- 
cation between the regulatory and active sites. 

Let P(x) be the probability distribution of the 3A^ atomic coordinates x = 
(xi, 2:1, . . . , Xtv, -2^7v) of a molecular model in the harmonic approximation. Let 
X = (xi,X2), where xi is the SA'^i coordinates of a subset of atoms of interest, and X2 is 
the 3A^2 coordinates of the remaining atoms. We are interested in calculating the marginal 
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distribution -P(xi): 

P(xi)= |rf3A^^XiP(xi,X2). (1) 

We now calculate -P(xi) in a model of molecular vibrations. Consider a harmonic approx- 
imation to the potential energy function f/(x), where x is the deviation from an equilibrium 
conformation Xq: 

f/(x + xo)^f/(xo) + ^xtHx. (2) 

The matrix H is the Hessian of U evaluated at xq: -f^ylxo = d'^U / dxidxj\^f^. We assume a 
Boltzmann distribution for -P(x), and ignore solvent and pressure effects: 



x+Hx 



Invtxl^ 3Af 



P(x) = Z-^e^^ = {2'KkBT)~^^/^e ^"bt JJuJi, (3) 

1=1 

where Z is the partition function, ks is Boltzmann's constant, T is the temperature, the 
elements of the matrix |f2|^ = diag{ujf, . . . ,^73^) are the eigenvalues of H, and the columns 
of the matrix V are the eigenvectors of H. To calculate P(xi) we define the submatrices 
Hi, H2, and G as follows: 

/ 



Hx 



(4) 



V 



Hi G xi ) Hixi + Gx2 
Gt H2 j 1^ X2 j \ Gtxi + H2X2 

Hi couples coordinates from xi; H2 couples coordinates from X2; and G couples coordinates 
between xi and X2. Eq. (jH)) now can be expressed as 
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P(x) = Z-^e^^ = ^ \{uji, (5) 

i=l 

where the diagonal elements of the matrix |A|^ = diag(Ai, . . . , A|^J and the columns of 

the matrix U are the eigenvalues and eigenvectors of H2, and the diagonal elements of the 

- 2 - 
matrix O = diag(c(}^, . . . ,ci;|jvi) ^"^^ the columns of the matrix V are the eigenvalues and 



eigenvectors of a matrix H defined as 

H = Hi - GH2^G"f = v|f7pV^ (6) 

Eq. (jni) is equivalent to an equation independently derived to study local vibrations in the 
nucleotide-binding pockets of myosin and kinesin Performing the integral in Eq. 
leads to the desired equation for P(xi): 

I . 1 2 

- OVTxi SA?"! 

P(xi) = (27rA;BT)-3^i/2e [] a;,. (7) 

2=1 



Now consider the problem of optimal selection of the parameters F of a coarse-grained 
model of protein dynamics. Let be the coordinates of the Na alpha-carbons in an an 
all-atom model, and x^'"^ be the same coordinates in the coarse-grained model. We define the 
optimal coarse-grained model as the one for which the Kullback-Leibler divergence between 
P^^^Xa) and P{xa) is minimal, i.e., for which T is chosen such that 

^S = /^^^"x.P(^nx.)ln^^^ (8) 

is minimal. We previously calculated an analytic expression for equations like Eq. (jH} when 
P(x„) and P*-^^(xa) are both governed by harmonic vibrations P]: 
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^{r) = y{i^^ + vjAx^o +-T 
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(9) 

i=i \ ^'^a^ ^ j=l OJl' ' 

In Eq. ©, OJl ' and v,^^ are the eigenvalue and eigenvector of mode i of the coarse-grained 
model; ujf and Vj are the i*^ eigenvalue and eigenvector of the matrix H calculated for the 
alpha-carbon atoms of the all-atom model (Eq. ©), and Ax^^o = x^j q — x^^o is the difference 
between the equilibrium coordinates of the coarse-grained and all-atom models. An optimal 
coarse-grained model of harmonic vibrations is thus one with parameters V such that D'^^ 
calculated using Eq. Q is minimal. 

n 

In the ENM interacting alpha-carbon atoms are connected by springs aligned with 
the direction of atomic separation. Following the Tirion model of harmonic vibrations 
each spring has the same force constant 7. For a given interaction network, the eigenvectors 
V are independent of 7, and each eigenvalue u^P'^ is proportional to 7. The value of 7 at 
which D'^lj^ is minimal may be calculated using Eq. Q: 

1 -iNc 3Afa ,7,2 

7=3^EE^vWS . (10) 

The proportionality constants of = oj^^ /7 are determined from the eigenvalue spectrum 
calculated using an arbitrary value of 7 (because the eigenvalues ujf''''^ are proportional to 
7, the constants of are independent of 7). It is easily shown that the third and fourth terms 
of Eq. cancel when 7 assumes the value given by Eq. (jlOj) . 

The interaction network in an elastic network model is generated by enabling interactions 
only between pairs of atoms separated by a distance less than or equal to a cutoff distance Tc- 
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To optimize the model, the value of Tc for which D^^ is minimal is numerically estimated, 
using values of 7 from Eq. (fTUjl . 

As a test case for optimization, we developed a coarse-grained model of bovine trypsinogen 
from an all-atom model (223 amino acids obtained from PDB entry 4TPI jsl). CHARMM 
was used for all-atom simulations using the CHARMM22 force field with default parameter 
values. HBUILD was used to generate hydrogen positions, and the energy was initially 
minimized using 2000 steps of relaxation by the adopted basis Newton-Raphson method, 
gradually reducing the weight of a harmonic restraint to the crystal-structure coordinates. 
The final minimized structure was obtained through vacuum minimization until a gradient 
of 10"'' Kcal/molA was achieved, and the Hessian H was calculated in CHARMM. The 
coordinates of the elastic network model were taken from the alpha-carbon coordinates of 
the minimized all-atom model. 

The alpha-carbon vibrations of the all-atom model were calculated by diagonalizing H 
from Eq. (jH)). Interestingly, the distribution of the density-of-states for the vibrations is 
bimodal (Fig. with 2/3 of the frequencies in the low- frequency spectrum and 1/3 of the 
frequencies in the high-frequency spectrum. Calculation of the density-of-states distribution 
from other globular proteins yields bimodal patterns with a similar 2:1 ratio between the 
numbers of low- and high-frequency modes (unpublished results). 

The best elastic network model of trypsinogen was obtained using a cutoff distance Tc of 
approximately 7.75 A, for which the optimal value of 7 is 53.4 Kcal/molA^, yielding a value 
of = 312.9 in a sharp minimum with respect to r^.- The density-of-states distribution 
for the elastic network model is unimodal, unlike that for the all-atom model (Fig. QJ. 

Although the ENM treats all alpha-carbon pairs equally, the distribution of distances 
between successive alpha-carbons along the protein backbone is known to be tightly centered 
about 3.8 A. In addition, two of the six alpha-carbons nearest to a typical alpha-carbon are 
backbone neighbors, which might explain why 1/3 of the CHARMM-derived modes have 
significantly higher frequencies than the others. We therefore wondered whether the ENM 
might be improved by enhancing interactions between backbone neighbors. 

Indeed, a more accurate coarse-grained model is obtained by using a force constant en- 
hanced by a factor of e for interactions between alpha-carbons that are neighbors on the back- 
bone. Minimization of Dy^^ for such a backbone-enhanced elastic network model (BENM) 
with respect to e and Tc subject to Eq. (fTUj) yields a model with e = 42, = 10.5 A, 
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FIG. 1: Density-of-states distribution for all-atom and elastic network models of trypsinogen. 
Frequency units are (Kcal/mol mp)^/^ = 2.04 x 10^^ Hz, where nip is the proton mass. Densities 
were estimated by counting the number of modes in bins of width 0.2, and normalizing the integral 
to 663, which is the total number of non-zero modes. The ENM {dotted blue) does not reproduce 
the bimodal distribution from the all-atom model {solid red); however, the BENM recovers the 
bimodal distribution {dashed green). 

and 7 = 4.26 Kcal/molA^, resulting in a much lower value D^^ = 102.3. The density-of- 
states distribution for this model agrees quite well with that of the all-atom model (Fig. P), 
especially considering that the model is optimized with respect to -Dx^? which does not di- 
rectly involve the density-of-states distribution. The agreement is especially good for the 
high-frequency modes, suggesting that a uniform force constant is a reasonable approxi- 
mation for interactions between alpha-carbons that are backbone neighbors. Furthermore, 
the overlap J2iLiJ2jLi l^t'^^'^jl'^ /N for the 223 highest-frequency modes is 0.99, indicating 
that the spaces of the high-frequency eigenvectors are nearly identical between the BENM 
and all-atom models. In contrast, the low-frequency distribution of BENM states is nar- 
rower than that of the all-atom model, indicating that a uniform force constant is a poorer 
approximation for interactions between alpha-carbons that are not backbone neighbors. 

Both the BENM and the ENM yield patterns of alpha-carbon MSDs that are simi- 
lar to that of the all-atom model (Fig. 12)). Because there are fewer low- frequency BENM 
modes than low-frequency CHARMM modes (Fig. the BENM MSDs are consistently 
smaller than the CHARMM MSDs; however, the BENM MSDs may be improved by se- 
lecting 7 = 1.2 Kcal/molA^ (Fig- E))- These improved MSDs come at the cost of a higher 
value of = 528.4, and a change in the frequency scale by a factor (1.2/4.3)^/^ = 0.53, 
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FIG. 2: Mean-squared displacements of alpha-carbon positions for trypsinogen residues 10-229 ob- 
tained from normal-modes simulations using CHARMM [dashed green), a BENM with parameters 
that minimize with respect to CHARMM (dotted blue), the same BENM but with 7 adjusted to 
better agree with CHARMM MSDs (fine-dotted magenta), and an ENM with parameters adjusted 
to agree with CHARMM MSDs {dash-dotted cyan). Values were calculated at T = 300 K using the 
Equipartition Theorem. Harmonic vibrations at thermal equilibrium are known to inadequately 
model crystallographic MSDs, which include other sources of disorder {solid red) joj. 



resulting in a poor model of the density-of-states distribution. The ENM with parameters 
that minimize exhibits poor MSDs (not shown); however, an ENM with = 15.4 A 
and 7 = 0.4 Kcal/molA^ yields MSDs that agree well with those of the CHARMM model 
(Fig. 12). In agreement with previous results using the ENM ^, we confirmed that the 
parameters of both the ENM and BENM may be adjusted to yield a reasonable model of 
crystallographic MSDs (not shown). 

Next consider the problem of quantifying allosteric effects in proteins P| . In allosteric reg- 
ulation, molecular interactions cause changes in protein activity through changes in protein 
conformation. Although the importance of considering continuous conformational distribu- 
tions in understanding allosteric effects was recognized by Weber theories of allosteric 
regulation that consider continuous conformational distributions have been lacking. We be- 
gan to develop such a theory by defining the allosteric potential as the Kullback-Leibler 
divergence between protein conformational distributions before and after ligand bind- 
ing, and by calculating changes in the conformational distribution of the full protein-ligand 
complex in the harmonic approximation j^. Here we use the expression for the marginal 
distribution in Eq. ((Tj) to calculate an equation for the allosteric potential in the harmonic 
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approximation, and apply it to analyze allosteric mechanisms in trypsinogen. 

Let Xp be the protein coordinates selected from the coordinates x of a protein-ligand 
complex. -P'(^p) -P(^p) the protein conformational distributions with and without 
a ligand interaction. Eq. ((7j) enables -P'(xp) to be calculated from the full conformational 
distribution -P'(x) of the protein-ligand complex. The equation for the allosteric potential 
in the harmonic approximation follows from the theory developed in ref. 

In Eq. u'"^ and are the i^^ eigenvalue and eigenvector of the matrix H calculated for 
the protein atoms of the protein-ligand complex, ujf and Vj are the eigenvalue and eigenvector 
of mode i of the apo-protein, and Axp o = Xp g — Xp_o is the difference between the equilibrium 
coordinates of the protein with and without the ligand interaction. The term J2^^i \n.Q[/uJi 
is proportional to the change in configurational entropy of the protein releasing the ligand, 
and the term Y3i_^i0jf vjAxp^o /2/c_bT is proportional to the potential energy required to 
deform the apo-protein into its equilibrium conformation in the protein-ligand complex. 

We used Eq. (ITT|) to calculate changes in the configurational distribution of local regions 
of trypsinogen upon binding bovine pancreatic trypsinogen inhibitor (BPTI). BPTI binds in 
;he active site and exerts an allosteric effect, enhancing the affinity of trypsinogen for Val-Val 



IjJ. Alpha-carbon coordinates for 223 residues were obtained from a crystal structure of 
trypsinogen in complex with BPTI (residues 7-229 from PDB entry 4TPI 8|, including theo- 
retically modeled residues 7-9), and were used directly to construct backbone-enhanced elas- 
tic network models of apo-trypsinogen and the trypsinogen-BPTI complex. As suggested by 
the refined trypsinogen model above, both models used Tc = 10.5 A, 7 = 4.26 Kcal/molA^, 
and e = 42. 

Local changes in the conformational distribution of trypsinogen were analyzed by consid- 
ering changes in the neighborhood of each alpha-carbon atom. A neighborhood was defined 
by selecting the atom of interest plus its five nearest neighbors, and the matrix H was calcu- 
lated for these six atoms in the models both with (yielding H') and without (yielding H) the 
BPTI interaction. A local value of Dx was obtained using the eigenvalues and eigenvectors 
of H' and H in a suitably modified version of Eq. PHI . 

Not surprisingly, we found that the local values of were relatively large in the neigh- 
borhood of the BPTI-binding site (Fig. El left panel). Values of Z)x elsewhere on the surface 
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FIG. 3: Visualization of local sites on the surface of trypsinogen that exhibit a large change in 
the conformational distribution upon binding BPTI. Values of Z)x are mapped to a logarithmic 
temperature scale, with red coloring indicating large values. Changes are large both in the BPTI- 
binding site (left) and in the Val-Val binding site (right). There is a 90° rotation about the x-axis 
between the left and right panels. 

were smaller, with one interesting exception: values in the Val-Val binding site were com- 
parable to those in the BPTI-binding site (Fig. El right panel). 

We also calculated local values of for the Val-Val interaction, which causes the crystal 
structure of trypsinogen to resemble that of active trypsin 0, [l^ . We found that values 
were relatively large in the neighborhood of Ser 195, which is the key catalytic residue for 
trypsin and other serine proteases: the value of in this neighborhood was 40*^ highest 
of 223 residues in the crystal structure; 11**^ of all residues not directly interacting with the 
Val-Val in the model; the highest of all residues located at least as far as Ser 195 is from 
the Val-Val ligand; and greater than that for 20 of 60 residues located closer to the ligand. 
Calculations for both the BPTI interaction and the Val-Val interaction therefore indicate 
that there is a relatively strong communication between the regulatory and active sites of 
trypsinogen. 

Considering models beyond the ENM and BENM (and even models beyond proteins), 
the theory presented here leads to a general prescription for modeling harmonic vibrations 
using coarse-grained models of materials. To optimally model the all-atom conformational 
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distribution, always use an energy scale for interactions that eliminates the discrepancy due 
to differences in the eigenvectors (Eq. ()10|) ). and select the coarse-grained model for which 
the entropy of the conformational distribution is the largest (first term of Eq. Q). 



Although traditional e 
and dynamics of proteins 



astic network models can explain characteristics of the functions 



isl ] , the present study shows that they provide a poor approxima- 
tion to the conformational distribution calculated from all-atom models of harmonic vibra- 
tions of proteins. Model accuracy is significantly improved by using a backbone-enhanced 
elastic network model, which strengthens interactions between atoms that are nearby in 
terms of covalent linkage. Although the backbone-enhanced model appears to accurately 
capture the high-frequency alpha-carbon vibrations of an all-atom model, the model less 
accurately captures the slower, large-scale harmonic vibrations (which in turn are known to 
poorly approximate the full spectrum of highly nonlinear, large-scale protein motions). 

We also find that the allosteric potential is a useful tool for computational analysis of 
allosteric mechanisms in proteins. Using calculations of the allosteric potential, commu- 
nication between the regulatory and active sites of trypsinogen was observed in a purely 
mechanical, coarse-grained model of protein harmonic vibrations that does not consider 
mean conformational changes or amino-acid identities, supporting prior arguments for the 
possibility of allostery without a mean conformational change [IJ]. It will be interesting 
to perform similar analyses on a wide range of all-atom and coarse-grained models of pro- 
tein vibrations, and to use more reahstic calculations of free-energy landscapes [l^ to more 
accurately model changes in protein conformational distributions. 
This work was supported by the US Department of Energy. 
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